Session 4 of the Multiplexed Tissue Imaging Workshop focuses on single-cell analysis and preprocessing. The first section demonstrates how to perform spillover correction before highlighting a batch correction approach and cell phenotyping.

Spillover correction

In IMC, small levels of channel-to-channel spillover occur. This spillover can be corrected for by assuming a linear relationship between the signal of one channel and the signal in its neighboring channels.
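This linear model can be sketched in a few lines of base R. Note that the channel names, spill percentages and pixel values below are made up purely for illustration; they are not taken from the workshop data.

```r
# Hypothetical 2x2 spillover matrix: 5% of channel 1 spills into channel 2,
# 2% of channel 2 spills into channel 1 (diagonal = the channel itself).
sm <- matrix(c(1.00, 0.05,
               0.02, 1.00), nrow = 2, byrow = TRUE)

# True signal of two single-positive pixels (rows)
true_signal <- matrix(c(100,   0,
                          0, 200), nrow = 2, byrow = TRUE)

# The observed signal is the true signal mixed by the spillover matrix;
# compensation inverts this mixing.
observed <- true_signal %*% sm
compensated <- observed %*% solve(sm)

all.equal(compensated, true_signal)  # TRUE (up to floating-point error)
```

Estimating the spillover matrix from single-metal spots, as done below, amounts to measuring the off-diagonal entries of `sm` from pixels whose true signal is known to be single-positive.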

This section highlights how to generate a spillover matrix from individually acquired single-metal spots on an agarose slide. Each spot needs to be imaged as its own acquisition/ROI, and individual TXT files containing the pixel intensities per spot need to be available. For complete details on the spillover correction approach, please refer to the original publication.

As part of the data download, you retrieved an MCD file which contains the individually measured metal spots.

In brief, the highlighted workflow comprises nine steps:

  1. Reading in the data
  2. Quality control
  3. (Optional) pixel binning
  4. “Debarcoding” for pixel assignment
  5. Pixel selection for spillover matrix estimation
  6. Spillover matrix generation
  7. Saving the results
  8. Single-cell compensation
  9. Image compensation

Generate the spillover matrix

In the first step, we will generate a spillover matrix based on the single-metal spots and save it for later use.

Read in the data

Here, we will read in the individual TXT files into a SingleCellExperiment object. This object can be used directly by the CATALYST package to estimate the spillover.

For this to work, the .txt file names need to contain the spotted metal isotope name. By default, the first occurrence of the isotope in the format (mt)(mass) (e.g. Sm152 for the Samarium isotope with atomic mass 152) will be used as the spot identifier. Alternatively, a named list of already read-in pixel intensities can be provided. For more information, please refer to the man page ?readSCEfromTXT.

For further downstream analysis, we will asinh-transform the data using a cofactor of 5; a common transformation for CyTOF data.
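To illustrate what this transformation does (on toy counts, not the workshop data): asinh with a cofactor of 5 is near-linear for small counts and behaves like a log transformation for large counts, while remaining defined at zero.

```r
# asinh transformation with cofactor 5 on a range of toy counts
counts <- c(0, 1, 10, 100, 1000)
exprs <- asinh(counts / 5)
round(exprs, 2)
```

Dividing by the cofactor controls where the transition from linear to logarithmic behavior occurs; 5 is a common choice for CyTOF-type count data.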

library(imcRtools)
# Create SingleCellExperiment from .txt files
sce <- readSCEfromTXT("data/compensation/") 
## Spotted channels:  Y89, In113, In115, Pr141, Nd142, Nd143, Nd144, Nd145, Nd146, Sm147, Nd148, Sm149, Nd150, Eu151, Sm152, Eu153, Sm154, Gd155, Gd156, Gd158, Tb159, Gd160, Dy161, Dy162, Dy163, Dy164, Ho165, Er166, Er167, Er168, Tm169, Er170, Yb171, Yb172, Yb173, Yb174, Lu175, Yb176
## Acquired channels:  Ar80, Y89, In113, In115, Xe131, Xe134, Ba136, La138, Pr141, Nd142, Nd143, Nd144, Nd145, Nd146, Sm147, Nd148, Sm149, Nd150, Eu151, Sm152, Eu153, Sm154, Gd155, Gd156, Gd158, Tb159, Gd160, Dy161, Dy162, Dy163, Dy164, Ho165, Er166, Er167, Er168, Tm169, Er170, Yb171, Yb172, Yb173, Yb174, Lu175, Yb176, Ir191, Ir193, Pt196, Pb206
## Channels spotted but not acquired:  
## Channels acquired but not spotted:  Ar80, Xe131, Xe134, Ba136, La138, Ir191, Ir193, Pt196, Pb206
assay(sce, "exprs") <- asinh(counts(sce)/5)

Quality control

In the next step, we will inspect the median pixel intensities per spot and threshold them at 200 counts. These visualizations serve two purposes:

  1. Small median pixel intensities (< 200 counts) might hinder the robust estimation of the channel spillover. In that case, consecutive pixels can be summed (see Optional pixel binning).

  2. Each spotted metal (row) should show the highest median pixel intensity in its corresponding channel (column). If this is not the case, either the naming of the .txt files was incorrect or the incorrect metal was spotted.

# Log10 median pixel counts per spot and channel
plotSpotHeatmap(sce)

# Thresholded on 200 pixel counts
plotSpotHeatmap(sce, log = FALSE, threshold = 200)

As we can see, all median pixel intensities are > 200 counts for each spot. We also observe acquired channels for which no spot was placed (e.g. Xe134, Ir191, Ir193).
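The second QC purpose listed above, each spotted metal reaching its maximum in its own channel, amounts to checking diagonal dominance of the spot-by-channel median matrix. A base-R sketch on made-up median values (hypothetical channels, not the workshop measurements):

```r
# Hypothetical median intensities: rows are spotted metals, columns are channels
medians <- rbind(Sm152 = c(Sm152 = 500, Eu153 = 12,  Gd155 = 3),
                 Eu153 = c(Sm152 = 8,   Eu153 = 620, Gd155 = 10),
                 Gd155 = c(Sm152 = 2,   Eu153 = 7,   Gd155 = 410))

# Each spotted metal (row) should reach its maximum in its own channel (column)
colnames(medians)[apply(medians, 1, which.max)] == rownames(medians)
```

If any entry is FALSE, either the .txt file naming or the spotting of that metal should be revisited.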

Optional pixel binning

In cases where median pixel intensities are low (< 200 counts), consecutive pixels can be summed to increase the robustness of the spillover estimation. The imcRtools package provides the binAcrossPixels function, which performs aggregation for each channel across bin_size consecutive pixels per spotted metal.
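What binAcrossPixels computes can be sketched for a single channel of a single spot in base R (toy pixel values, chosen only to make the sums easy to follow):

```r
# Sum each run of bin_size consecutive pixels:
# 12 pixels with bin_size = 4 yield 3 binned values
pixels <- 1:12
bin_size <- 4
bins <- rep(seq_len(length(pixels) / bin_size), each = bin_size)
binned <- as.vector(tapply(pixels, bins, sum))
binned  # 10 26 42
```

Summing (rather than averaging) preserves the count nature of the data while increasing the per-observation signal.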

# Define grouping
bin_size <- 10

sce2 <- binAcrossPixels(sce, bin_size = bin_size)

# Log10 median pixel counts per spot and channel
plotSpotHeatmap(sce2)

# Thresholded on 200 pixel counts
plotSpotHeatmap(sce2, log = FALSE, threshold = 200)

Here, we can see an increase in the median pixel intensities and an accumulation of off-diagonal signal. Because the original pixel intensities are already high, we will refrain from aggregating across consecutive pixels for this demonstration.

Filtering incorrectly assigned pixels

The following step uses functions provided by the CATALYST package to “debarcode” the pixels. Based on the intensity distribution of all channels, pixels are assigned to their corresponding barcode; here, this is the already known metal spot. This procedure serves to identify pixels that cannot be robustly assigned to the spotted metal. Such pixels can be regarded as “noisy”, “background” or “artefacts” and should be removed prior to spillover estimation.

We will also need to specify which channels were spotted (argument bc_key). This information is directly contained in the colData(sce) slot. To facilitate visualization, we will order the bc_key by mass.

The general workflow for pixel debarcoding is as follows:

  1. assign a preliminary metal mass to each pixel
  2. for each spotted metal, estimate a cutoff on the separation between positive and negative pixel sets
  3. apply the estimated cutoffs to identify truly positive pixels
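The preliminary assignment step can be sketched in base R on made-up pixels (three hypothetical pixels and three spotted masses; the actual CATALYST implementation additionally normalizes intensities before assignment):

```r
# Toy sketch of step 1: assign each pixel the spotted mass whose channel
# shows the highest signal
bc_key <- c(152, 153, 155)
pixels <- rbind(c(480, 10,  5),   # clearly mass 152
                c(12,  530, 9),   # clearly mass 153
                c(6,   8,   390)) # clearly mass 155

prelim_id <- bc_key[apply(pixels, 1, which.max)]
prelim_id  # 152 153 155
```

Steps 2 and 3 then decide, per spotted metal, how well separated a pixel must be from the other channels to keep this assignment; poorly separated pixels receive the id 0.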
library(CATALYST)

bc_key <- as.numeric(unique(sce$sample_mass))
bc_key <- bc_key[order(bc_key)]

sce <- assignPrelim(sce, bc_key = bc_key)
sce <- estCutoffs(sce)
sce <- applyCutoffs(sce)

The obtained SingleCellExperiment now contains the additional bc_id entry. For each pixel, this vector indicates the assigned mass (e.g. 161) or 0, meaning unassigned.

This information can be visualized in the form of a heatmap:

library(pheatmap)
cur_table <- table(sce$bc_id, sce$sample_mass)

pheatmap(log10(cur_table + 1),
         cluster_rows = FALSE,
         cluster_cols = FALSE)

Here we can see that all pixels were assigned to the correct mass and that each pixel set comprises > 1000 pixels.

However, in cases where incorrect assignment occurred or where few pixels were measured for some spots, the imcRtools package exports a simple helper function to exclude pixels based on these criteria:

sce <- filterPixels(sce, minevents = 40, correct_pixels = TRUE)

Compute spillover matrix

Based on the single-positive pixels, we use the CATALYST::computeSpillmat() function to compute the spillover matrix and CATALYST::plotSpillmat() to visualize it. The plotSpillmat function checks the spotted and acquired metal isotopes against a pre-defined CATALYST::isotope_list(). In this data, the Ar80 channel was additionally acquired to check for deviations in signal intensity. Ar80 needs to be added to a custom isotope_list object for visualization.

sce <- computeSpillmat(sce)

isotope_list <- CATALYST::isotope_list
isotope_list$Ar <- 80

plotSpillmat(sce, isotope_list = isotope_list)

# Save spillover matrix in new object
sm <- metadata(sce)$spillover_matrix

Of note: the visualization of the spillover matrix using CATALYST currently does not display spillover among the heavier channels. In this case, the spillover matrix is clipped at Yb171.

As we can see, the largest spillover occurs from In113 into In115, and we also observe the +16 oxide impurities, e.g. Nd148 -> Dy164.

We can save the spillover matrix for external use.

write.csv(sm, "data/sm.csv")

Single-cell data compensation

The CATALYST package can be used to perform spillover compensation on the single-cell mean intensities. Here, the SpatialExperiment object generated in Section @ref(read-data) is read in. The CATALYST package requires an entry to rowData(spe)$channel_name for the compCytof function to run. This entry should contain the metal isotopes in the form (mt)(mass)Di (e.g., Sm152Di for Samarium isotope with the atomic mass 152).

The compCytof function performs channel spillover compensation on the mean pixel intensities per channel and cell. Here, we will not overwrite the assays in the SpatialExperiment object, so that we can later highlight the effect of compensation. As shown in Section @ref(read-data), the compensated counts are also asinh-transformed using a cofactor of 1.

spe <- readRDS("data/spe.rds")
rowData(spe)$channel_name <- paste0(rowData(spe)$channel, "Di")

spe <- compCytof(spe, sm, 
                 transform = TRUE, cofactor = 1,
                 isotope_list = isotope_list, 
                 overwrite = FALSE)

To check the effect of channel spillover compensation, the expression of markers that are affected by spillover (e.g., E-cadherin in channel Yb173 and CD303 in channel Yb174) can be visualized in the form of scatter plots before and after compensation.

library(dittoSeq)
library(patchwork)
before <- dittoScatterPlot(spe, x.var = "Ecad", y.var = "CD303",
                           assay.x = "exprs", assay.y = "exprs") +
    ggtitle("Before compensation")

after <- dittoScatterPlot(spe, x.var = "Ecad", y.var = "CD303",
                          assay.x = "compexprs", assay.y = "compexprs") +
    ggtitle("After compensation")
before + after

We observe that the spillover Yb173 –> Yb174 was successfully corrected. To facilitate further downstream analysis, the non-compensated assays can now be replaced by their compensated counterparts:

assay(spe, "counts") <- assay(spe, "compcounts") 
assay(spe, "exprs") <- assay(spe, "compexprs") 
assay(spe, "compcounts") <- assay(spe, "compexprs") <- NULL

Image compensation

The cytomapper package allows channel spillover compensation directly on multi-channel images. The compImage function takes a CytoImageList object and the estimated spillover matrix as input. More information on how to work with CytoImageList objects can be found in Section @ref(image-visualization).

At this point, we can read in the CytoImageList object containing multi-channel images as generated in Section @ref(read-data). The channelNames need to be set according to their metal isotope in the form (mt)(mass)Di and therefore match colnames(sm).

library(cytomapper)

images <- readRDS("data/images.rds")
masks <- readRDS("data/masks.rds")
channelNames(images) <- rowData(spe)$channel_name

The CATALYST package provides the adaptSpillmat function, which corrects the spillover matrix in a way that rows and columns match a predefined set of metals. Please refer to ?compCytof for more information on how metals in the spillover matrix are matched to acquired channels in the SingleCellExperiment object.

The spillover matrix can now be adapted to exclude channels that are not part of the measurement (keep == 0).
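The core subsetting idea can be sketched in base R on a hypothetical 3-channel matrix (the real adaptSpillmat additionally handles channels that were acquired but never spotted, which is why the output below warns about unestimated Ir191/Ir193 interactions):

```r
# Hypothetical spillover matrix with (mt)(mass)Di channel names
sm_toy <- matrix(c(1.00, 0.05, 0.01,
                   0.02, 1.00, 0.03,
                   0.00, 0.04, 1.00),
                 nrow = 3, byrow = TRUE,
                 dimnames = rep(list(c("Sm152Di", "Eu153Di", "Gd155Di")), 2))

# Keep only the channels that were actually measured (Eu153Di dropped)
measured <- c("Sm152Di", "Gd155Di")
sm_toy[measured, measured]
```

The adapted matrix must have exactly the measured channels as rows and columns so that matrix multiplication against the pixel data is well defined.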

library(tiff)
panel <- read.csv("data/steinbock/panel.csv")
adapted_sm <- adaptSpillmat(sm, paste0(panel$channel[panel$keep == 1], "Di"), 
                            isotope_list = isotope_list)
## Compensation is likely to be inaccurate.
## Spill values for the following interactions
## have not been estimated:
## Ir191Di -> Ir193Di
## Ir193Di -> Ir191Di

The adapted spillover matrix now matches the channelNames of the CytoImageList object and can be used to perform pixel-level spillover compensation.

library(BiocParallel)
library(parallel)

images_comp <- compImage(images, adapted_sm, 
                         BPPARAM = MulticoreParam())

As a sanity check, we will visualize the image before and after compensation:

# Before compensation
plotPixels(images[5], colour_by = "Yb173Di", 
           image_title = list(text = "Yb173 (Ecad) - before", position = "topleft"), 
           legend = NULL, bcg = list(Yb173Di = c(0, 4, 1)))

plotPixels(images[5], colour_by = "Yb174Di", 
           image_title = list(text = "Yb174 (CD303) - before", position = "topleft"), 
           legend = NULL, bcg = list(Yb174Di = c(0, 4, 1)))

# After compensation
plotPixels(images_comp[5], colour_by = "Yb173Di",
           image_title = list(text = "Yb173 (Ecad) - after", position = "topleft"), 
           legend = NULL, bcg = list(Yb173Di = c(0, 4, 1)))

plotPixels(images_comp[5], colour_by = "Yb174Di", 
           image_title = list(text = "Yb174 (CD303) - after", position = "topleft"),
           legend = NULL, bcg = list(Yb174Di = c(0, 4, 1)))

For convenience, we will re-set the channelNames to their biological targets:

channelNames(images_comp) <- rownames(spe)

Batch correction

In the previous Session we observed staining/expression differences between the individual samples. These differences can arise for technical reasons (e.g., differences in sample processing) as well as biological ones (e.g., differential expression between patients/indications). However, the combination of these effects may hinder cell phenotyping via clustering.

To integrate cells across samples, we can use computational strategies developed for correcting batch effects in single-cell RNA sequencing data. Computational tools in R that perform such corrections include batchelor, harmony and Seurat. Due to time constraints, we only highlight the batchelor package for batch correction.

Of note: the correction approaches presented here aim at removing any differences between samples. This will also remove biological differences between the patients/indications. Nevertheless, integrating cells across samples can facilitate the detection of cell phenotypes via clustering.

fastMNN correction

The batchelor package provides the mnnCorrect and fastMNN functions to correct for differences between samples/batches. Both functions build on finding mutual nearest neighbors (MNN) among the cells of different samples and correcting expression differences between them. The mnnCorrect function returns corrected expression counts, while the fastMNN function performs the correction in reduced-dimension space. As such, fastMNN returns the integrated cells in the form of a low-dimensional embedding.
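The MNN idea itself can be sketched in base R on made-up one-dimensional expression values (toy data; the real algorithm works in high-dimensional or PCA space and applies a correction vector to the pairs it finds):

```r
# Two batches of cells, each cell represented by a single expression value
batch_a <- c(1.0, 5.0, 9.0)
batch_b <- c(1.2, 5.1, 20.0)

# Pairwise distances and nearest neighbors across batches
d <- abs(outer(batch_a, batch_b, "-"))
nn_b_of_a <- apply(d, 1, which.min)  # nearest batch-B cell per batch-A cell
nn_a_of_b <- apply(d, 2, which.min)  # nearest batch-A cell per batch-B cell

# A batch-A cell is in an MNN pair if its nearest neighbor points back at it
mnn_a <- which(nn_a_of_b[nn_b_of_a] == seq_along(batch_a))
mnn_a  # cells 1 and 2 form MNN pairs; cell 3 and the outlier 20.0 do not
```

Restricting the correction to mutual pairs is what protects batch-unique populations (like the outlier above) from being forcibly merged.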

Paper: Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors
Documentation: batchelor

Perform sample correction

Here, we apply the fastMNN function to integrate cells between patients. By setting auto.merge = TRUE the function estimates the best batch merging order by maximizing the number of MNN pairs at each merging step. This is more time consuming than merging sequentially based on how batches appear in the dataset (default). We again select the markers defined in Section @ref(cell-processing) for sample correction.

The function returns a SingleCellExperiment object which contains corrected low-dimensional coordinates for each cell in the reducedDim(out, "corrected") slot. This low-dimensional embedding can be further used for clustering and non-linear dimensionality reduction. We transfer the corrected coordinates to the main SpatialExperiment object.

library(batchelor)
set.seed(220228)
out <- fastMNN(spe, batch = spe$patient_id,
               auto.merge = TRUE,
               subset.row = rowData(spe)$use_channel,
               assay.type = "exprs")

# Transfer the correction results to the main spe object
reducedDim(spe, "fastMNN") <- reducedDim(out, "corrected")

Quality control of correction results

The fastMNN function further returns outputs that can be used to assess the quality of the batch correction. The metadata(out)$merge.info entry collects diagnostics for each individual merging step. Here, the batch.size and lost.var entries are important. The batch.size entry reports the relative magnitude of the batch effect and the lost.var entry represents the percentage of lost variance per merging step. A large batch.size and low lost.var indicate sufficient batch correction.

merge_info <- metadata(out)$merge.info 

DataFrame(left = merge_info$left,
          right = merge_info$right,
          batch.size = merge_info$batch.size,
          max_lost_var = rowMax(merge_info$lost.var))
## DataFrame with 3 rows and 4 columns
##                         left    right batch.size max_lost_var
##                       <List>   <List>  <numeric>    <numeric>
## 1                   Patient4 Patient2   0.376540    0.0456565
## 2          Patient4,Patient2 Patient1   0.552928    0.0420837
## 3 Patient4,Patient2,Patient1 Patient3   0.753851    0.0744367

We observe that Patient4 and Patient2 are most similar, with a low batch effect. Merging cells of Patient3 into the combined batch of Patient1, Patient2 and Patient4 resulted in the highest percentage of lost variance and the largest detected batch effect. In the next section, we visualize the correction results.

Visualization

The simplest option to check whether the sample effects were corrected is to apply non-linear dimensionality reduction and observe the mixing of cells across samples. We will recompute the UMAP embedding using the corrected low-dimensional coordinates for each cell.

library(scater)

set.seed(220228)
spe <- runUMAP(spe, dimred = "fastMNN", name = "UMAP_mnnCorrected") 

Next, we visualize the corrected UMAP while overlaying patient IDs.

library(cowplot)
library(dittoSeq)
library(viridis)

# visualize patient id 
p1 <- dittoDimPlot(spe, var = "patient_id", 
                   reduction.use = "UMAP", size = 0.2) + 
    scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
    ggtitle("Patient ID on UMAP before correction")
p2 <- dittoDimPlot(spe, var = "patient_id", 
                   reduction.use = "UMAP_mnnCorrected", size = 0.2) + 
    scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
    ggtitle("Patient ID on UMAP after correction")

plot_grid(p1, p2)

We observe an imperfect merging of Patient3 into all other samples. This was already seen when displaying the merging information above. We now also visualize the expression of selected markers across all cells before and after batch correction.

markers <- c("Ecad", "CD45RO", "CD20", "CD3", "FOXP3", "CD206", "MPO", "SMA", "Ki67")

# Before correction
plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP", 
                   assay = "exprs", size = 0.2, list.out = TRUE) 
plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
plot_grid(plotlist = plot_list) 

# After correction
plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP_mnnCorrected", 
                   assay = "exprs", size = 0.2, list.out = TRUE) 
plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
plot_grid(plotlist = plot_list) 

We observe that immune cells across patients are merged after batch correction using fastMNN. However, the tumor cells of different patients still cluster separately.

Cell phenotyping

A common step during single-cell data analysis is the annotation of cells based on their phenotype. Defining cell phenotypes is often subjective and relies on previous biological knowledge. The Orchestrating Single Cell Analysis with Bioconductor book presents a number of approaches to phenotype cells detected by single-cell RNA sequencing based on reference datasets or prior knowledge of geneset importance.

In highly multiplexed imaging, target proteins or molecules are manually selected based on the biological question at hand. This narrows down the feature space and facilitates the manual annotation of clusters to derive cell phenotypes. We will therefore highlight the use of one clustering approach to group cells based on their similarity in marker expression.

Unlike single-cell RNA sequencing or CyTOF data, single-cell data derived from highly multiplexed imaging often suffer from “lateral spillover” between neighboring cells. This spillover, caused by imperfect segmentation, often hinders accurate clustering to define specific cell phenotypes in multiplexed imaging data. In Section @ref(classification) we will train and apply a random forest classifier to classify cell phenotypes in the dataset as an alternative to clustering-based cell phenotyping. This approach has previously been used to identify major cell phenotypes in metastatic melanoma and avoids clustering of cells.

We will first sample 2000 cells to visualize cluster membership.

# Sample cells
set.seed(220619)
cur_cells <- sample(seq_len(ncol(spe)), 2000)

Clustering approaches

As a first approach to identifying cellular phenotypes in the dataset, we will present a popular clustering method that groups cells based on their similarity in marker expression. A number of approaches have been developed to cluster data derived from single-cell RNA sequencing technologies [@Yu2022] or CyTOF [@Weber2016]. For demonstration purposes, we will only highlight the PhenoGraph clustering approach. For a full overview of different clustering approaches, please refer to the Phenotyping section of the online book.

Rphenograph

The PhenoGraph clustering approach was first described to group cells of a CyTOF dataset [@Levine2015]. The algorithm first constructs a graph by detecting the k nearest neighbors of each cell based on Euclidean distance in expression space. In the next step, edges between nodes (cells) are weighted by the overlap of their nearest-neighbor sets, quantified by the Jaccard index. The Louvain modularity optimization approach is then used to detect connected communities and partition the graph into clusters of cells. This clustering strategy was used by Jackson, Fischer et al. and Schulz et al. to cluster IMC data [@Jackson2020; @Schulz2018].
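The Jaccard edge weighting can be sketched for two cells in base R (made-up neighbor indices with k = 5; in the real graph this weight is computed for every edge):

```r
# Nearest-neighbor sets of two hypothetical cells (cell indices)
nn_cell1 <- c(2, 3, 4, 5, 6)
nn_cell2 <- c(3, 4, 5, 7, 8)

# Jaccard index: shared neighbors over the union of both neighbor sets
jaccard <- length(intersect(nn_cell1, nn_cell2)) /
    length(union(nn_cell1, nn_cell2))
jaccard  # 3/7: three shared neighbors out of seven distinct ones
```

Edges between cells with few shared neighbors thus receive low weights, which helps the community detection separate distinct phenotypes.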

There are several different PhenoGraph implementations available in R. Here, we use the one available at https://github.com/i-cyto/Rphenograph. For large datasets, https://github.com/stuchly/Rphenoannoy offers a more performant implementation of the algorithm.

In the following code chunk, we select the asinh-transformed mean pixel intensities per cell and channel and subset the channels to the ones containing biological variation. This matrix is transposed to store cells in rows. Within the Rphenograph function, we select the 45 nearest neighbors for graph building and Louvain community detection (default). The function returns a list of length 2, the first entry being the graph and the second entry containing the community object. Calling membership on the community object returns cluster IDs for each cell. These cluster IDs are then stored within the colData of the SpatialExperiment object. Cluster IDs are mapped onto the UMAP embedding, and single-cell marker expression within each cluster is visualized in the form of a heatmap.

It is recommended to test different inputs to k as shown in the next section. Selecting larger values for k results in larger clusters.

library(Rphenograph)
library(igraph)
library(dittoSeq)
library(viridis)

mat <- t(assay(spe, "exprs")[rowData(spe)$use_channel,])

out <- Rphenograph(mat, k = 45)

clusters <- factor(membership(out[[2]]))

spe$pg_clusters <- clusters

dittoDimPlot(spe, var = "pg_clusters", 
             reduction.use = "UMAP", size = 0.2,
             do.label = TRUE) +
    ggtitle("Phenograph clusters expression on UMAP")

dittoHeatmap(spe[,cur_cells], 
             genes = rownames(spe)[rowData(spe)$use_channel],
             assay = "exprs", scale = "none",
             heatmap.colors = viridis(100), 
             annot.by = c("pg_clusters", "patient_id"),
             annot.colors = c(dittoColors(1)[1:length(unique(spe$pg_clusters))],
                              metadata(spe)$color_vectors$patient_id))

We can observe that some of the clusters only contain cells of a single patient. This can often be observed in the tumor compartment. In the next step, we use the integrated cells (see Section @ref(batch-effects)) in low dimensional embedding for clustering. Here, the low dimensional embedding can be directly accessed from the reducedDim slot.

mat <- reducedDim(spe, "fastMNN")

out <- Rphenograph(mat, k = 45)

clusters <- factor(membership(out[[2]]))

spe$pg_clusters_corrected <- clusters

dittoDimPlot(spe, var = "pg_clusters_corrected", 
             reduction.use = "UMAP_mnnCorrected", size = 0.2,
             do.label = TRUE) +
    ggtitle("Phenograph clusters expression on UMAP, integrated cells")

dittoHeatmap(spe[,cur_cells], 
             genes = rownames(spe)[rowData(spe)$use_channel],
             assay = "exprs", scale = "none",
             heatmap.colors = viridis(100), 
             annot.by = c("pg_clusters_corrected", "pg_clusters","patient_id"),
             annot.colors = c(dittoColors(1)[1:length(unique(spe$pg_clusters_corrected))],
                              dittoColors(1)[1:length(unique(spe$pg_clusters))],
                              metadata(spe)$color_vectors$patient_id))

Clustering using the integrated embedding leads to clusters that contain cells of different patients. Cluster annotation can now be performed by manually labeling cells based on their marker expression.

Classification approach

In this section, we will highlight a cell type classification approach based on ground truth labeling and random forest classification. The rationale for this supervised cell phenotyping approach is to use the information contained in the pre-defined markers to detect cells of interest. This approach was used by Hoch et al. (2022) to classify cell types in a metastatic melanoma IMC dataset.

The antibody panel used in the example dataset mainly focuses on immune cell types and less on tumor cell phenotypes. Therefore, we will label the following cell types:

  • Tumor (E-cadherin positive)
  • Stroma (SMA, PDGFRb positive)
  • Plasma cells (CD38 positive)
  • Neutrophil (MPO, CD15 positive)
  • Myeloid cells (HLADR positive)
  • B cells (CD20 positive)
  • B next to T cells (CD20, CD3 positive)
  • Regulatory T cells (FOXP3 positive)
  • CD8+ T cells (CD3, CD8 positive)
  • CD4+ T cells (CD3, CD4 positive)

The “B next to T cell” phenotype (BnTcell) is commonly observed in immune infiltrated regions measured by IMC. We include this phenotype to account for B cell/T cell interactions where precise classification into B cells or T cells is not possible.

As related approaches, Astir and Garnett use pre-defined panel information to classify cell phenotypes based on their marker expression.

Manual labeling of cells

The cytomapper package provides the cytomapperShiny function that allows gating of cells based on their marker expression and visualization of selected cells directly on the images.

library(cytomapper)
if (interactive()) {
    cytomapperShiny(object = spe, mask = masks, image = images_comp, 
                    cell_id = "ObjectNumber", img_id = "sample_id")
}

The labeled cells for this data set can be accessed at zenodo.org/record/7432486 and were downloaded previously. Per image, the cytomapperShiny function allows the export of gated cells in the form of a SingleCellExperiment or SpatialExperiment object. The cell label is stored in colData(object)$cytomapper_CellLabel and the gates are stored in metadata(object). In the next section, we will read in and consolidate the labeled data.

Define color vectors

For consistent visualization of cell types, we will now pre-define their colors:

celltype <- setNames(c("#3F1B03", "#F4AD31", "#894F36", "#1C750C", "#EF8ECC", 
                       "#6471E2", "#4DB23B", "grey", "#F4800C", "#BF0A3D", "#066970"),
                     c("Tumor", "Stroma", "Myeloid", "CD8", "Plasma_cell", 
                       "Treg", "CD4", "undefined", "BnTcell", "Bcell", "Neutrophil"))

metadata(spe)$color_vectors$celltype <- celltype

Read in and consolidate labeled data

Here, we will read in the individual SpatialExperiment objects containing the labeled cells and concatenate them. When concatenating the SpatialExperiment objects along their columns, replicated sample_id entries are suffixed with .1, .2, .3, ....
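This suffixing follows the same scheme as base R's make.unique, shown here on a made-up sample id:

```r
# Duplicated ids receive .1, .2, ... suffixes; the first occurrence is kept as-is
make.unique(c("Patient1_001", "Patient1_001", "Patient1_001"))
# "Patient1_001" "Patient1_001.1" "Patient1_001.2"
```

This keeps the column names of the concatenated object unique, which is required for matching cells back to the main SpatialExperiment object later.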

library(SingleCellExperiment)
label_files <- list.files("data/gated_cells", 
                          full.names = TRUE, pattern = ".rds$")

# Read in SPE objects
spes <- lapply(label_files, readRDS)

# Merge SPE objects
concat_spe <- do.call("cbind", spes)

In the following code chunk we will identify cells that were labeled multiple times. This occurs when different cell phenotypes are gated per image and can affect immune cells that are located inside the tumor compartment.

We will first identify those cells that were uniquely labeled. In the next step, we will identify those cells that were labeled twice AND were labeled as Tumor cells. These cells will be assigned their immune cell label. Finally, we will save the unique labels within the original SpatialExperiment object.

cur_tab <- unclass(table(colnames(concat_spe), 
                         concat_spe$cytomapper_CellLabel))
cur_labels <- rep("doublets", nrow(cur_tab))
names(cur_labels) <- rownames(cur_tab)

# Single assignments
single_index <- rowSums(cur_tab) == 1
cur_labels[single_index] <- colnames(cur_tab)[apply(cur_tab[single_index,], 1, 
                                                    which.max)]

# Double assignment within the tumor
double_index <- rowSums(cur_tab) == 2 & cur_tab[,"Tumor"] == 1
no_tumor <- cur_tab[,colnames(cur_tab) != "Tumor"]
cur_labels[double_index] <- colnames(no_tumor)[apply(no_tumor[double_index,], 1, 
                                                    which.max)]

# Remove doublets
cur_labels <- cur_labels[cur_labels != "doublets"]
table(cur_labels)
## cur_labels
##       Bcell     BnTcell         CD4         CD8     Myeloid  Neutrophil 
##         780        1702         688         822        1750         134 
## Plasma_cell      Stroma        Treg       Tumor 
##         716         442         361        5999
# Transfer labels to SPE object
spe_labels <- rep("unlabeled", ncol(spe))
names(spe_labels) <- colnames(spe)
spe_labels[names(cur_labels)] <- cur_labels
spe$cell_labels <- spe_labels

# Number of cells labeled per patient
table(spe$cell_labels, spe$patient_id)
##              
##               Patient1 Patient2 Patient3 Patient4
##   Bcell            152      131      234      263
##   BnTcell          396       37      240     1029
##   CD4               45      342      167      134
##   CD8               60      497      137      128
##   Myeloid          183      378      672      517
##   Neutrophil        97        4       17       16
##   Plasma_cell       34      536       87       59
##   Stroma            84       37       85      236
##   Treg             139      149       49       24
##   Tumor           2342      906     1618     1133
##   unlabeled       7214     9780     7826     9580

Based on these labels, we can now train a random forest classifier to classify all remaining, unlabeled cells.

Train classifier

In this section, we will use the caret framework for machine learning in R. This package provides an interface to train a number of regression and classification models in a coherent fashion. We use a random forest classifier due to its low number of parameters, high speed and observed high performance for cell type classification [@Hoch2022].

In the following section, we will first split the SpatialExperiment object into labeled and unlabeled cells. Based on the labeled cells, we split the data into a train (75% of the data) and test (25% of the data) dataset. We currently do not provide an independently labeled validation dataset.

The caret package provides the trainControl function, which specifies model training parameters, and the train function, which performs the actual model training. While training the model, we also want to estimate the best model parameters. In the case of the chosen random forest model (method = "rf"), we only need to estimate a single parameter (mtry), which corresponds to the number of variables randomly sampled as candidates at each split. To estimate the best parameter, we will perform a 5-fold cross validation (set within trainControl) over a tuning grid of 5 candidate values for mtry (tuneLength = 5).

library(caret)

# Split between labeled and unlabeled cells
lab_spe <- spe[,spe$cell_labels != "unlabeled"]
unlab_spe <- spe[,spe$cell_labels == "unlabeled"]

# Randomly split into train and test data
set.seed(221029)
trainIndex <- createDataPartition(factor(lab_spe$cell_labels), p = 0.75)
train_spe <- lab_spe[,trainIndex$Resample1]
test_spe <- lab_spe[,-trainIndex$Resample1]

# Specify train parameters for 5-fold cross validation
fitControl <- trainControl(method = "cv",
                           number = 5)

# Select the data for training
cur_mat <- t(assay(train_spe, "exprs")[rowData(train_spe)$use_channel,])

# Train a random forest model for predicting cell labels
# This call also performs parameter tuning
rffit <- train(x = cur_mat, 
               y = factor(train_spe$cell_labels),
               method = "rf", ntree = 1000,
               tuneLength = 5,
               trControl = fitControl)

rffit
## Random Forest 
## 
## 10049 samples
##    37 predictor
##    10 classes: 'Bcell', 'BnTcell', 'CD4', 'CD8', 'Myeloid', 'Neutrophil', 'Plasma_cell', 'Stroma', 'Treg', 'Tumor' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 8040, 8039, 8038, 8038, 8041 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.9674570  0.9565125
##   10    0.9766143  0.9688787
##   19    0.9774111  0.9699432
##   28    0.9781076  0.9708717
##   37    0.9766153  0.9688730
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 28.

Classifier performance

We next assess the accuracy of the classifier when predicting cell phenotypes during cross-validation as well as on the test dataset.

First, we can visualize the classification accuracy during parameter tuning:

ggplot(rffit) + 
  geom_errorbar(data = rffit$results,
                aes(ymin = Accuracy - AccuracySD,
                    ymax = Accuracy + AccuracySD),
                width = 0.4) +
    theme_classic(base_size = 15)

The best value for mtry is 28 and is used when predicting new data.
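The selected value can also be read out programmatically; caret stores it in the `bestTune` slot of the trained model (a short sketch continuing the session above):

```r
# The tuning parameter combination selected during cross validation;
# this is the value used by default when predicting new data
rffit$bestTune
```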

It is often recommended to visualize the variable importance of the classifier. The following plot specifies which variables (markers) are most important for classifying the data.

plot(varImp(rffit))

As expected, the markers that were used for gating (Ecad, CD3, CD20, HLADR, CD8a, CD38, FOXP3) were important for classification.
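To complement the plot, the importance scores can also be ranked numerically. This is a sketch assuming the default importance measure returned by caret's varImp for random forests (a single Overall column); cur_imp is a helper name introduced here:

```r
# Extract the importance scores and list the 10 most important markers
cur_imp <- varImp(rffit)$importance
head(cur_imp[order(cur_imp$Overall, decreasing = TRUE), , drop = FALSE], 10)
```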

To assess the accuracy, sensitivity, specificity, and other quality measures of the classifier, we will now predict cell phenotypes in the test data.

# Select test data
cur_mat <- t(assay(test_spe, "exprs")[rowData(test_spe)$use_channel,])

# Predict cell phenotypes in test data
cur_pred <- predict(rffit, 
                    newdata = cur_mat)
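As a quick first check, the overall accuracy on the test set can be computed directly from these predictions (a minimal sketch continuing the session objects above):

```r
# Fraction of test cells whose predicted label matches the gated label
mean(cur_pred == factor(test_spe$cell_labels))
```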

While the overall classification accuracy can appear high, we also want to check if each cell phenotype class is correctly predicted. For this, we will calculate the confusion matrix between predicted and actual cell labels. This measure highlights individual cell phenotype classes that were not correctly predicted by the classifier. When setting mode = "everything", the confusionMatrix function returns all available prediction measures including sensitivity, specificity, precision, recall and the F1 score per cell phenotype class.

cm <- confusionMatrix(data = cur_pred, 
                      reference = factor(test_spe$cell_labels), 
                      mode = "everything")

cm
## Confusion Matrix and Statistics
## 
##              Reference
## Prediction    Bcell BnTcell  CD4  CD8 Myeloid Neutrophil Plasma_cell Stroma
##   Bcell         187       2    0    0       0          0           7      0
##   BnTcell         3     423    1    0       0          0           0      0
##   CD4             0       0  161    0       0          2           3      2
##   CD8             0       0    0  198       0          0           8      0
##   Myeloid         0       0    2    2     437          0           0      0
##   Neutrophil      0       0    0    0       0         30           0      0
##   Plasma_cell     1       0    4    3       0          0         157      0
##   Stroma          0       0    2    0       0          0           0    108
##   Treg            0       0    0    0       0          0           3      0
##   Tumor           4       0    2    2       0          1           1      0
##              Reference
## Prediction    Treg Tumor
##   Bcell          0     0
##   BnTcell        0     0
##   CD4            0     4
##   CD8            0     5
##   Myeloid        0     0
##   Neutrophil     0     0
##   Plasma_cell    0     0
##   Stroma         0     0
##   Treg          90     2
##   Tumor          0  1488
## 
## Overall Statistics
##                                          
##                Accuracy : 0.9803         
##                  95% CI : (0.975, 0.9847)
##     No Information Rate : 0.4481         
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.9737         
##                                          
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: Bcell Class: BnTcell Class: CD4 Class: CD8
## Sensitivity               0.95897         0.9953    0.93605    0.96585
## Specificity               0.99714         0.9986    0.99653    0.99586
## Pos Pred Value            0.95408         0.9906    0.93605    0.93839
## Neg Pred Value            0.99746         0.9993    0.99653    0.99777
## Precision                 0.95408         0.9906    0.93605    0.93839
## Recall                    0.95897         0.9953    0.93605    0.96585
## F1                        0.95652         0.9930    0.93605    0.95192
## Prevalence                0.05830         0.1271    0.05142    0.06129
## Detection Rate            0.05590         0.1265    0.04813    0.05919
## Detection Prevalence      0.05859         0.1277    0.05142    0.06308
## Balanced Accuracy         0.97806         0.9970    0.96629    0.98086
##                      Class: Myeloid Class: Neutrophil Class: Plasma_cell
## Sensitivity                  1.0000          0.909091            0.87709
## Specificity                  0.9986          1.000000            0.99747
## Pos Pred Value               0.9909          1.000000            0.95152
## Neg Pred Value               1.0000          0.999095            0.99308
## Precision                    0.9909          1.000000            0.95152
## Recall                       1.0000          0.909091            0.87709
## F1                           0.9954          0.952381            0.91279
## Prevalence                   0.1306          0.009865            0.05351
## Detection Rate               0.1306          0.008969            0.04694
## Detection Prevalence         0.1318          0.008969            0.04933
## Balanced Accuracy            0.9993          0.954545            0.93728
##                      Class: Stroma Class: Treg Class: Tumor
## Sensitivity                0.98182     1.00000       0.9927
## Specificity                0.99938     0.99846       0.9946
## Pos Pred Value             0.98182     0.94737       0.9933
## Neg Pred Value             0.99938     1.00000       0.9940
## Precision                  0.98182     0.94737       0.9933
## Recall                     0.98182     1.00000       0.9927
## F1                         0.98182     0.97297       0.9930
## Prevalence                 0.03288     0.02691       0.4481
## Detection Rate             0.03229     0.02691       0.4448
## Detection Prevalence       0.03288     0.02840       0.4478
## Balanced Accuracy          0.99060     0.99923       0.9936

To easily visualize these results, we can now plot the true positive rate (sensitivity) versus the false positive rate (1 - specificity):

library(tidyverse)

data.frame(cm$byClass) %>%
  mutate(class = sub("Class: ", "", rownames(cm$byClass))) %>%
  ggplot() + 
  geom_point(aes(1 - Specificity, Sensitivity, 
                 size = Detection.Rate,
                 fill = class),
             shape = 21) + 
  scale_fill_manual(values = metadata(spe)$color_vectors$celltype) +
  theme_classic(base_size = 15) + 
  ylab("Sensitivity (TPR)") +
  xlab("1 - Specificity (FPR)")

We observe high sensitivity and specificity for most cell types. Plasma cells show the lowest true positive rate at around 88%, which is still sufficiently high. The size of the circles scales with the detection rate, i.e. the fraction of all cells assigned to each class.
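The per-class sensitivities behind this plot can also be pulled directly from the confusion matrix object; a short sketch:

```r
# Per-class sensitivity (true positive rate), sorted from lowest to highest
sort(cm$byClass[, "Sensitivity"])
```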

Finally, to observe which cell phenotypes were wrongly classified, we can visualize the distribution of classification probabilities per cell phenotype class:

cur_pred <- predict(rffit, 
                    newdata = cur_mat, 
                    type = "prob")
cur_pred$truth <- factor(test_spe$cell_labels)

cur_pred %>%
  pivot_longer(cols = Bcell:Tumor) %>%
  ggplot() +
  geom_boxplot(aes(x = name, y = value, fill = name), outlier.size = 0.5) +
  facet_wrap(. ~ truth, ncol = 1) + 
  scale_fill_manual(values = metadata(spe)$color_vectors$celltype)  +
  theme(panel.background = element_blank(), 
        axis.text.x = element_text(angle = 45, hjust = 1))

The boxplots indicate the classification probabilities per class. The classifier is well trained if classification probabilities are high only for the cell's true class.
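One way to summarize this, assuming the cur_pred probability table generated above (class_cols and pred_class are helper names introduced here), is to check how often the highest-probability class agrees with the gated label:

```r
# For each cell, pick the class with the highest probability and
# compare it against the true (gated) label
class_cols <- levels(cur_pred$truth)
pred_class <- class_cols[max.col(as.matrix(cur_pred[, class_cols]))]
mean(pred_class == cur_pred$truth)
```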

Classification of new data

In the final section, we will now use the tuned and tested random forest classifier to predict the cell phenotypes of the unlabeled data.

First, we predict the cell phenotypes and extract their classification probabilities.

# Select unlabeled data
cur_mat <- t(assay(unlab_spe, "exprs")[rowData(unlab_spe)$use_channel,])

# Predict cell phenotypes
cell_class <- as.character(predict.train(rffit, 
                                         newdata = cur_mat, 
                                         type = "raw"))
names(cell_class) <- rownames(cur_mat)

table(cell_class)
## cell_class
##       Bcell     BnTcell         CD4         CD8     Myeloid  Neutrophil 
##         808         985        3604        2674        5641         558 
## Plasma_cell      Stroma        Treg       Tumor 
##        3381        4718        1162       10869
# Extract classification probabilities
cell_prob <- predict.train(rffit, 
                           newdata = cur_mat, 
                           type = "prob")

Each cell is assigned to the class with the highest probability. There are, however, cases where the highest probability is low, meaning the cell cannot be uniquely assigned to a class. We next want to identify these cells and label them as “undefined”. Here, we select a maximum classification probability threshold of 40%, but this threshold needs to be adjusted for other datasets. The adjusted cell labels are then stored in the SpatialExperiment object.

library(ggridges)

# Distribution of maximum probabilities
tibble(max_prob = rowMax(as.matrix(cell_prob)),
       type = cell_class) %>%
    ggplot() +
        geom_density_ridges(aes(x = max_prob, y = cell_class, fill = cell_class)) +
        scale_fill_manual(values = metadata(spe)$color_vectors$celltype) +
        theme_classic(base_size = 15) +
        xlab("Maximum probability") +
        ylab("Cell type") + 
        xlim(c(0,1.2))
## Picking joint bandwidth of 0.0206

# Label undefined cells
cell_class[rowMax(as.matrix(cell_prob)) < 0.4] <- "undefined"

# Store labels in SpatialExperiment object
cell_labels <- spe$cell_labels
cell_labels[colnames(unlab_spe)] <- cell_class
spe$celltype <- cell_labels 

table(spe$celltype, spe$patient_id)
##              
##               Patient1 Patient2 Patient3 Patient4
##   Bcell            175      531      422      459
##   BnTcell          416      585      591     1086
##   CD4              421     1485      726     1557
##   CD8              506     1341      485     1149
##   Myeloid         1215     1910     1526     2657
##   Neutrophil       352        9      136      181
##   Plasma_cell      816     2483      421      341
##   Stroma           614      611      697     3223
##   Treg             550      416      256      293
##   Tumor           5592     3346     5817     2102
##   undefined         89       80       55       71

Cell phenotype visualization

After classifying all cells in the dataset by phenotype, we need to perform quality control to ensure correct labelling.

First, we will visualize the cell labels on the UMAP embedding:

dittoDimPlot(spe, var = "celltype", 
             reduction.use = "UMAP_mnnCorrected", size = 0.2,
             do.label = TRUE) +
  scale_color_manual(values = metadata(spe)$color_vectors$celltype) +
  theme(legend.title = element_blank()) +
  ggtitle("Cell types on UMAP, integrated cells")

We can also visualize each cell's expression in the form of a heatmap while labelling the columns by cell phenotype.

# Subsample cells for visualization (cur_cells is defined here;
# adjust the sample size as needed)
set.seed(220619)
cur_cells <- sample(seq_len(ncol(spe)), 2000)

# Heatmap visualization - dittoHeatmap
dittoHeatmap(spe[,cur_cells], 
             genes = rownames(spe)[rowData(spe)$use_channel],
             assay = "exprs", order.by = c("celltype"),
             cluster_cols = FALSE, scale = "none",
             heatmap.colors = viridis(100), annot.by = c("celltype","indication","patient_id"),
             annotation_colors = list(indication = metadata(spe)$color_vectors$indication,
                                      patient_id = metadata(spe)$color_vectors$patient_id,
                                      celltype = metadata(spe)$color_vectors$celltype))

We can also visualize the differences in cell phenotype composition between patients in the form of barplots.

# by patient_id - percentage
dittoBarPlot(spe, var = "celltype", group.by = "patient_id") +
    scale_fill_manual(values = metadata(spe)$color_vectors$celltype)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.

The cytomapper package further provides functionality to visualize the cell phenotypes on segmentation masks. For visualization purposes, we again select three images.

# Sample images
set.seed(220517)
cur_id <- sample(unique(spe$sample_id), 3)
cur_images <- images_comp[names(images_comp) %in% cur_id]
cur_masks <- masks[names(masks) %in% cur_id]

plotCells(cur_masks,
          object = spe, 
          cell_id = "ObjectNumber", img_id = "sample_id",
          colour_by = "celltype")

A recommended way of validating the cell phenotype assignments is to select a cell phenotype of interest and outline the corresponding cells on composite images. In the following example, we select CD8+ T cells and visualize them on composite images displaying the marker intensities of CD3 and CD8a.

CD8 <- spe[,spe$celltype == "CD8"]

plotPixels(image = cur_images,
           mask = cur_masks,
           object = CD8, 
           cell_id = "ObjectNumber", img_id = "sample_id",
           colour_by = c("CD3", "CD8a"),
           outline_by = "celltype",
           bcg = list(CD3 = c(0, 5, 1),
                      CD8a = c(0, 5, 1)),
           colour = list(celltype = c("CD8" = "white")),
           thick = TRUE)

For easier viewing, the plot can also be written out to file:

if (!dir.exists("data/CellValidation")) dir.create("data/CellValidation")

plotPixels(image = cur_images,
           mask = cur_masks,
           object = CD8, 
           cell_id = "ObjectNumber", img_id = "sample_id",
           colour_by = c("CD3", "CD8a"),
           outline_by = "celltype",
           bcg = list(CD3 = c(0, 5, 1),
                      CD8a = c(0, 5, 1)),
           colour = list(celltype = c("CD8" = "white")),
           thick = TRUE,
           save_plot = list(filename = "data/CellValidation/CD8.png"))

Save objects

Finally, the generated data objects can be saved for further downstream processing and analysis.

saveRDS(spe, "data/spe.rds")

Session info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggridges_0.5.4              forcats_0.5.2              
##  [3] stringr_1.5.0               dplyr_1.0.10               
##  [5] purrr_0.3.5                 readr_2.1.3                
##  [7] tidyr_1.2.1                 tibble_3.1.8               
##  [9] tidyverse_1.3.2             caret_6.0-93               
## [11] lattice_0.20-45             igraph_1.3.5               
## [13] Rphenograph_0.99.1.9003     viridis_0.6.2              
## [15] viridisLite_0.4.1           cowplot_1.1.1              
## [17] scater_1.26.1               scuttle_1.8.2              
## [19] batchelor_1.14.0            BiocParallel_1.32.4        
## [21] tiff_0.1-11                 cytomapper_1.10.0          
## [23] EBImage_4.40.0              patchwork_1.1.2            
## [25] dittoSeq_1.10.0             ggplot2_3.4.0              
## [27] pheatmap_1.0.12             CATALYST_1.22.0            
## [29] imcRtools_1.4.2             SpatialExperiment_1.8.0    
## [31] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
## [33] Biobase_2.58.0              GenomicRanges_1.50.1       
## [35] GenomeInfoDb_1.34.4         IRanges_2.32.0             
## [37] S4Vectors_0.36.1            BiocGenerics_0.44.0        
## [39] MatrixGenerics_1.10.0       matrixStats_0.63.0         
## 
## loaded via a namespace (and not attached):
##   [1] ModelMetrics_1.2.2.2        R.methodsS3_1.8.2          
##   [3] bit64_4.0.5                 knitr_1.41                 
##   [5] irlba_2.3.5.1               multcomp_1.4-20            
##   [7] DelayedArray_0.24.0         R.utils_2.12.2             
##   [9] rpart_4.1.19                data.table_1.14.6          
##  [11] hardhat_1.2.0               RCurl_1.98-1.9             
##  [13] doParallel_1.0.17           generics_0.1.3             
##  [15] flowCore_2.10.0             ScaledMatrix_1.6.0         
##  [17] terra_1.6-47                TH.data_1.1-1              
##  [19] RANN_2.6.1                  future_1.29.0              
##  [21] proxy_0.4-27                bit_4.0.5                  
##  [23] tzdb_0.3.0                  xml2_1.3.3                 
##  [25] lubridate_1.9.0             httpuv_1.6.7               
##  [27] assertthat_0.2.1            gargle_1.2.1               
##  [29] gower_1.0.0                 xfun_0.35                  
##  [31] hms_1.1.2                   jquerylib_0.1.4            
##  [33] evaluate_0.19               promises_1.2.0.1           
##  [35] fansi_1.0.3                 readxl_1.4.1               
##  [37] dbplyr_2.2.1                DBI_1.1.3                  
##  [39] htmlwidgets_1.5.4           googledrive_2.0.0          
##  [41] ellipsis_0.3.2              ggnewscale_0.4.8           
##  [43] ggpubr_0.5.0                backports_1.4.1            
##  [45] cytolib_2.10.0              svgPanZoom_0.3.4           
##  [47] sparseMatrixStats_1.10.0    vctrs_0.5.1                
##  [49] abind_1.4-5                 cachem_1.0.6               
##  [51] withr_2.5.0                 ggforce_0.4.1              
##  [53] vroom_1.6.0                 svglite_2.1.0              
##  [55] cluster_2.1.4               crayon_1.5.2               
##  [57] drc_3.0-1                   recipes_1.0.3              
##  [59] edgeR_3.40.0                pkgconfig_2.0.3            
##  [61] labeling_0.4.2              units_0.8-1                
##  [63] tweenr_2.0.2                nlme_3.1-160               
##  [65] vipor_0.4.5                 nnet_7.3-18                
##  [67] globals_0.16.2              rlang_1.0.6                
##  [69] lifecycle_1.0.3             sandwich_3.0-2             
##  [71] modelr_0.1.10               rsvd_1.0.5                 
##  [73] cellranger_1.1.0            randomForest_4.7-1.1       
##  [75] polyclip_1.10-4             Matrix_1.5-3               
##  [77] raster_3.6-11               carData_3.0-5              
##  [79] Rhdf5lib_1.20.0             zoo_1.8-11                 
##  [81] reprex_2.0.2                beeswarm_0.4.0             
##  [83] RTriangle_1.6-0.11          googlesheets4_1.0.1        
##  [85] GlobalOptions_0.1.2         png_0.1-8                  
##  [87] rjson_0.2.21                bitops_1.0-7               
##  [89] shinydashboard_0.7.2        R.oo_1.25.0                
##  [91] pROC_1.18.0                 ConsensusClusterPlus_1.62.0
##  [93] KernSmooth_2.23-20          rhdf5filters_1.10.0        
##  [95] DelayedMatrixStats_1.20.0   shape_1.4.6                
##  [97] classInt_0.4-8              parallelly_1.33.0          
##  [99] jpeg_0.1-10                 rstatix_0.7.1              
## [101] ggsignif_0.6.4              beachmat_2.14.0            
## [103] scales_1.2.1                magrittr_2.0.3             
## [105] plyr_1.8.8                  zlibbioc_1.44.0            
## [107] compiler_4.2.1              dqrng_0.3.0                
## [109] concaveman_1.1.0            RColorBrewer_1.1-3         
## [111] plotrix_3.8-2               clue_0.3-63                
## [113] cli_3.4.1                   XVector_0.38.0             
## [115] listenv_0.8.0               FlowSOM_2.6.0              
## [117] MASS_7.3-58.1               tidyselect_1.2.0           
## [119] stringi_1.7.8               RProtoBufLib_2.10.0        
## [121] highr_0.9                   yaml_2.3.6                 
## [123] BiocSingular_1.14.0         locfit_1.5-9.6             
## [125] ggrepel_0.9.2               grid_4.2.1                 
## [127] sass_0.4.4                  timechange_0.1.1           
## [129] tools_4.2.1                 future.apply_1.10.0        
## [131] circlize_0.4.15             rstudioapi_0.14            
## [133] foreach_1.5.2               gridExtra_2.3              
## [135] prodlim_2019.11.13          farver_2.1.1               
## [137] Rtsne_0.16                  ggraph_2.1.0               
## [139] DropletUtils_1.18.1         digest_0.6.31              
## [141] lava_1.7.0                  shiny_1.7.3                
## [143] Rcpp_1.0.9                  car_3.1-1                  
## [145] broom_1.0.1                 later_1.3.0                
## [147] RcppAnnoy_0.0.20            httr_1.4.4                 
## [149] sf_1.0-9                    ComplexHeatmap_2.14.0      
## [151] distances_0.1.8             colorspace_2.0-3           
## [153] rvest_1.0.3                 fs_1.5.2                   
## [155] XML_3.99-0.13               splines_4.2.1              
## [157] uwot_0.1.14                 graphlayouts_0.8.4         
## [159] sp_1.5-1                    systemfonts_1.0.4          
## [161] xtable_1.8-4                jsonlite_1.8.4             
## [163] tidygraph_1.2.2             timeDate_4021.107          
## [165] ipred_0.9-13                R6_2.5.1                   
## [167] pillar_1.8.1                htmltools_0.5.4            
## [169] mime_0.12                   nnls_1.4                   
## [171] glue_1.6.2                  fastmap_1.1.0              
## [173] DT_0.26                     BiocNeighbors_1.16.0       
## [175] fftwtools_0.9-11            class_7.3-20               
## [177] codetools_0.2-18            mvtnorm_1.1-3              
## [179] utf8_1.2.2                  bslib_0.4.1                
## [181] ResidualMatrix_1.8.0        ggbeeswarm_0.6.0           
## [183] colorRamps_2.3.1            gtools_3.9.4               
## [185] magick_2.7.3                survival_3.4-0             
## [187] limma_3.54.0                rmarkdown_2.18             
## [189] munsell_0.5.0               e1071_1.7-12               
## [191] GetoptLong_1.0.5            rhdf5_2.42.0               
## [193] GenomeInfoDbData_1.2.9      iterators_1.0.14           
## [195] HDF5Array_1.26.0            haven_2.5.1                
## [197] reshape2_1.4.4              gtable_0.3.1